There are 9 variables including “state,” “date,” “fips,” “cases,” “deaths,” “geo_id,” “population,” “pop_density,” and “abb” with a total of 58,094 observations. The first 5 observations are from Alabama with number of deaths ranging from 213 to 21,400. The last 5 observations are from Wyoming, with number of deaths ranging from 596 to 1,884.
state date fips cases deaths geo_id population pop_density abb
1029 Alabama 2020-03-13 1 6 0 1 4887871 96.50939 AL
597 Alabama 2020-03-14 1 12 0 1 4887871 96.50939 AL
282 Alabama 2020-03-15 1 23 0 1 4887871 96.50939 AL
12 Alabama 2020-03-16 1 29 0 1 4887871 96.50939 AL
266 Alabama 2020-03-17 1 39 0 1 4887871 96.50939 AL
78 Alabama 2020-03-18 1 51 0 1 4887871 96.50939 AL
summary(cv_states)
state date fips cases
Washington : 1158 Min. :2020-01-21 Min. : 1.00 Min. : 1
Illinois : 1155 1st Qu.:2020-12-06 1st Qu.:16.00 1st Qu.: 112125
California : 1154 Median :2021-09-11 Median :29.00 Median : 418120
Arizona : 1153 Mean :2021-09-10 Mean :29.78 Mean : 947941
Massachusetts: 1147 3rd Qu.:2022-06-17 3rd Qu.:44.00 3rd Qu.: 1134318
Wisconsin : 1143 Max. :2023-03-23 Max. :72.00 Max. :12169158
(Other) :51184
deaths geo_id population pop_density
Min. : 0 Min. : 1.00 Min. : 577737 Min. : 1.292
1st Qu.: 1598 1st Qu.:16.00 1st Qu.: 1805832 1st Qu.: 43.659
Median : 5901 Median :29.00 Median : 4468402 Median : 107.860
Mean : 12553 Mean :29.78 Mean : 6397965 Mean : 423.031
3rd Qu.: 15952 3rd Qu.:44.00 3rd Qu.: 7535591 3rd Qu.: 229.511
Max. :104277 Max. :72.00 Max. :39557045 Max. :11490.120
NA's :1106
abb
WA : 1158
IL : 1155
CA : 1154
AZ : 1153
MA : 1147
WI : 1143
(Other):51184
min(cv_states$date)
[1] "2020-01-21"
max(cv_states$date)
[1] "2023-03-23"
The date range is from 2020-01-21 to 2023-03-23. The range of cases are from 1 to 12,169,158. The range of deaths is 0 to 104,277.
Add new_cases and new_deaths and correct outliers
library(ggplot2)library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:data.table':
between, first, last
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:data.table':
yearmon, yearqtr
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(plotly)plot_cases <- cv_states_today %>%plot_ly(x =~population, y =~cases, type ='scatter', mode ='markers', color =~state,size =~population, sizes =c(5, 70), marker =list(sizemode='diameter', opacity=0.5)) %>%layout(title ="Population vs. Cases",xaxis =list(title ="Population"),yaxis =list(title ="Cases"))plot_cases
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
cv_states_today_filter <- cv_states_today %>%filter(state !="District of Columbia")plot_cases_filtered <- cv_states_today_filter %>%plot_ly(x =~population, y =~cases, type ='scatter', mode ='markers', color =~state,size =~population, sizes =c(5, 70), marker =list(sizemode='diameter', opacity=0.5)) %>%layout(title ="Population vs. Cases (Filtered)",xaxis =list(title ="Population"),yaxis =list(title ="Cases"))plot_cases_filtered
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
plot_deaths_per100k <- cv_states_today_filter %>%plot_ly(x =~population, y =~deathsper100k, type ='scatter', mode ='markers', color =~state,size =~population, sizes =c(5, 70), marker =list(sizemode='diameter', opacity=0.5)) %>%layout(title ="Population vs. Deaths per 100k",xaxis =list(title ="Population"),yaxis =list(title ="Deaths per 100k"))plot_deaths_per100k
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
plot_hover <- cv_states_today_filter %>%plot_ly(x =~population, y =~deathsper100k, type ='scatter', mode ='markers', color =~state,size =~population, sizes =c(5, 70), marker =list(sizemode='diameter', opacity=0.5),hoverinfo ='text',text =~paste(state, "<br>Cases per 100k:", per100k, "<br>Deaths per 100k:", deathsper100k)) %>%layout(title ="Population-normalized COVID-19 deaths (per 100k) vs. Population",xaxis =list(title ="Population"),yaxis =list(title ="Deaths per 100k"),hovermode ="compare")plot_hover
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Explore scatterplot trend interactively using ggplotly and geom_smooth
library(ggplot2)library(plotly)cv_states_today_scatter <- cv_states_today_filterp <-ggplot(cv_states_today_scatter, aes(x = population, y = newdeathsper100k, size = population)) +geom_point(alpha =0.6) +geom_smooth(method ="lm", color ="blue") +labs(title ="Population Density vs New Deaths per 100k",x ="Population Density",y ="New Deaths per 100k")ggplotly(p)
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: size.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
The number of new deaths per 100k seem to decrease with increasing population density. It seems that there may be a negative correlation.
Multiple Line Chart
plot_naive_cfr <-plot_ly(cv_states, x =~date, y =~naive_CFR, color =~state, type ="scatter", mode ="lines") %>%layout(title ="Naive Case Fatality Rate (CFR) Over Time",xaxis =list(title ="Date"),yaxis =list(title ="Naive CFR (%)"))plot_naive_cfr
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
plot_florida <- cv_states %>%filter(state =="Florida") %>%plot_ly(x =~date, y =~new_cases, name ="New Cases", type ="scatter", mode ="lines", line =list(color ="blue")) %>%add_trace(y =~new_deaths, name ="New Deaths", mode ="lines", line =list(color ="red")) %>%layout(title ="New Cases and New Deaths in Florida",xaxis =list(title ="Date"),yaxis =list(title ="Count"),hovermode ="x unified")plot_florida
The states that increased in september included South Carolina, Florida, and Oklahoma. Over the next 2 months they seemed to gradually increase. in naive_cfr. The time difference between the peak cases and deaths in Florida was about half a month
California, New York, Texas, and Washington DC stood out from the heatmap of new cases. When repeated with newper100k, Rhode Island stood out the most.
Map
library(dplyr)library(plotly)pick.date <-"2021-10-15"cv_per100_date <- cv_states %>%filter(date == pick.date) %>%select(state, abb, newper100k, cases, deaths) %>%mutate(state_name = state,state = abb,hover =paste(state_name, '<br>', "Cases per 100k: ", newper100k, '<br>', "Cases: ", cases, '<br>', "Deaths: ", deaths) )cv_per100_today <- cv_states_today %>%select(state, abb, newper100k, cases, deaths) %>%mutate(state_name = state,state = abb,hover =paste(state_name, '<br>', "Cases per 100k: ", newper100k, '<br>', "Cases: ", cases, '<br>', "Deaths: ", deaths) )set_map_details <-list(scope ='usa',projection =list(type ='albers usa'),showlakes =TRUE,lakecolor =toRGB('white'))shadeLimit <-125fig_pick_date <-plot_geo(cv_per100_date, locationmode ='USA-states') %>%add_trace(z =~newper100k, text =~hover, locations =~state,color =~newper100k, colors ='Purples' ) %>%colorbar(title =paste0("Cases per 100k: ", pick.date), limits =c(0, shadeLimit)) %>%layout(title =paste('Cases per 100k by State as of ', pick.date, '<br>(Hover for value)'),geo = set_map_details )fig_today <-plot_geo(cv_per100_today, locationmode ='USA-states') %>%add_trace(z =~newper100k, text =~hover, locations =~state,color =~newper100k, colors ='Purples' ) %>%colorbar(title =paste0("Cases per 100k: ", Sys.Date()), limits =c(0, shadeLimit)) %>%layout(title =paste('Cases per 100k by State as of ', Sys.Date(), '<br>(Hover for value)'),geo = set_map_details )subplot(fig_pick_date, fig_today, nrows =2, margin =0.05)
The CFR has changed from some medium to dark purple spots on the map from 10/15/2021 to nearly all white on today’s date